refactor of calculateInterVsIntra to reduce complexity and increase p… by openpaul · Pull Request #47 · immcantation/scoper

openpaul · 2026-04-09T15:50:06Z

Hello,

After having looked into the pairwiseDist code (immcantation/alakazam#133) I had a look where in scoper this code is used.

I profiled the hierarchicalClones function with profvis and found that a lot of time is spend in the calculateInterVsIntra function, which is where the pairwiseDist function is called. Noticably the pairwiseDist call is not the most time consuming step though. From my trace I could see a lot of GC happening, but it did not directly show what calls caused this.

As such I tried multiple refactors of the code and came up with this solution.

The function has the same signature as before and produces identical output to the previous version. At least as much as my tests have shown me. Tbh the old code was a bit hard to understand, but I hope I got there in the end.

I tried to not use more memory, and keep the code simple. I still have for loops but I removed many paste0 calls, which I think is the cause for the speedup.

I ran a small dataset end to end with hierarchicalClones and see a drop from 35 seconds to 15 seconds compute time. In a larger test before the change it took 12 minutes, after only 1.1 minutes, which is a lot more than I expected. All the more reason to be critical of this PR.

I checked and the clone ids between both runs are identical, but I would appreciate a solid review of the code to make sure I did not introduce any bugs.

I think this function is also used by spectral clustering, but I did no test that path yet. That's why I decided to open this PR before as a draft for now. But thought the speedup is worth sharing. And I hoped that this is interesting enough to get feedback before I continue sinking time into this.

I hope I did not overlook any hidden assumptions of the previous code. The immcantation framework is quite cool but also very complex to disentangle as an outsider :)

All the best

…erformance

openpaul · 2026-04-13T17:14:26Z

Closing for now as speed differences are inconsistent and I want to test some more.

openpaul added 2 commits April 9, 2026 16:38

refactor of calculateInterVsIntra to reduce complexity and increase p…

23adf01

…erformance

refactor with lower memory usage according to bench:mark

24458e1

ssnn-airr requested a review from ggabernet April 13, 2026 17:13

openpaul closed this Apr 13, 2026

openpaul deleted the calculateInterVsIntra_refactor branch April 13, 2026 17:14

openpaul restored the calculateInterVsIntra_refactor branch April 13, 2026 17:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor of calculateInterVsIntra to reduce complexity and increase p…#47

refactor of calculateInterVsIntra to reduce complexity and increase p…#47
openpaul wants to merge 2 commits into
immcantation:masterfrom
openpaul:calculateInterVsIntra_refactor

openpaul commented Apr 9, 2026

Uh oh!

openpaul commented Apr 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

openpaul commented Apr 9, 2026

Uh oh!

openpaul commented Apr 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant